Abstract

This paper presents a useful theorem for non-linear transformations of the sum of independent, zero-mean,
Gaussian random variables. It is proved that the linear regression coefficient of the non-linear
transformation output with respect to the overall input is identical to the linear regression coefficient
with respect to any Gaussian random variable that is part of the input. As a side-result, the theorem also
simplifies the computation of the partial regression coefficient for non-linear transformations
of Gaussian-mixtures. Due to its generality, and the wide use of Gaussians and Gaussian-mixtures to
statistically model several phenomena, the potential use of the theorem spans multiple disciplines and
applications, including communication systems, as well as estimation and information theory. In this
view, the paper highlights how the theorem can be exploited to facilitate the derivation of fundamental
performance limits, such as the SNR, the MSE and the mutual information, in additive non-Gaussian
(possibly non-linear) channels.

Index Terms 

Gaussian random variables, Gaussian-mixtures, non-linearity, linear regression, SNR, MSE, mutual 
information. 



I. INTRODUCTION 

Non-linear transformations of Gaussian random variables, and processes, are a classical subject of
probability theory, with particular emphasis in communication systems. Several results are available in
the literature to statistically characterize the non-linear transformation output, for both real HI, 121, 131, 
S, 121, m, (91, El and complex El, El, El Gaussian-distributed input processes. 
If the input to the non-linear transformation is the sum of two, or more, Gaussian random variables,
then the overall input is still Gaussian and, consequently, the statistical characterization can still exploit
the wide classical literature on the subject. For instance, a key point is to establish the equivalent
input-output linear-gain (or linear regression coefficient) of the non-linearity. However, if the interest is to
infer only a part of the input from the overall output, and to establish a partial regression coefficient (or
linear-gain) with respect to this part of the input, it is necessary to compute multiple-folded integrals
involving the non-linear transformation. This task is in general tedious and, sometimes, prohibitive.
This paper proves that, if the non-linear transformation input is the sum of zero-mean, independent,
Gaussian random variables, all the partial regression coefficients are identical, and equal to the overall
input-output regression coefficient. Thus, the theorem greatly simplifies the computation of the partial
regression coefficient, which can be performed by a single-folded integral over the Gaussian probability
density function (pdf) of the overall input.

To the best of the author's knowledge, the theorem is new, or at least well hidden in the technical
literature. Due to its potential usefulness in several disciplines, it deserves to be highlighted to the
scientific community, which is the main purpose of this paper. As a valuable side-product, the theorem
simplifies the computation of the partial linear-gain also when the non-linearity input is the sum
of Gaussian-mixtures [25]. Gaussian-mixtures are widely used in multiple disciplines, such as to model
electromagnetic interference ifTTl . images background noise ifTOl . financial assets returns |[20il . and, more 
generally, to statistically model clustered data sets. Actually, it is the similarity of the theoretical results 
for suboptimal estimators of Gaussian sources impaired by a Gaussian-mixture (impulsive) noise in 121], 
with those of non-linear transformations of Gaussian random variables in fWl, |fT6l, flSl, that led to 
conjecture the existence of the theorem, which is proved and analyzed in this paper. Throughout the
paper $E\{\cdot\}$ is used for statistical expectation, interchangeably with $E_{X_1 \ldots X_N}\{\cdot\}$, which is used, when
necessary, to highlight the (joint) probability density function $f_{X_1,\ldots,X_N}(\cdot)$ involved in the expectation
integral.

II. STATISTICAL MODEL

Let us consider the random variable

$$Y = X + N, \tag{1}$$

which is the sum of two independent zero-mean real random variables, $X$ and $N$, distributed according
to a Gaussian pdf, with variances $\sigma_X^2$ and $\sigma_N^2$, as expressed by $f_X(x) = G(x; \sigma_X^2)$, $f_N(n) = G(n; \sigma_N^2)$,
and

$$G(\alpha; \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{\alpha^2}{2\sigma^2}}. \tag{2}$$

Fig. 1. The statistical model: the inputs $X$ and $N$ are summed and transformed into $Z = g(X + N)$.

It is well known that also $Y$ is Gaussian distributed [23], with $f_Y(y) = G(y; \sigma_Y^2)$ and $\sigma_Y^2 = \sigma_X^2 + \sigma_N^2$. Let us
also assume that $Z$ is a non-linear transformation of $Y$, as graphically shown in Fig. 1, and summarized
by

$$Z = g(Y) = g(X + N). \tag{3}$$

III. INPUT-OUTPUT LINEAR REGRESSION 

For any $Y$ and any non-linear transformation $g(\cdot)$, the output random variable $Z$ can be decomposed
as the sum of a scaled version of the input $Y$ and an uncorrelated distortion term $W_y$, as expressed by

$$Z = g(Y) = k_y Y + W_y, \tag{4}$$

where

$$k_y = \frac{E\{ZY\}}{E\{Y^2\}} \tag{5}$$

is the input-output linear gain, or regression coefficient, that grants the orthogonality between $Y$ and $W_y$,
i.e., $E\{YW_y\} = 0$.

The coefficient $k_y$ in (5) is the same coefficient that appears in the Bussgang theorem [1], which
extends (4), preserving the orthogonality of the distortion, only to special random processes, such as the
Gaussian ones. Indeed, for the class of stationary Bussgang processes [10], [12], it holds true that

$$Z(t) = k_y Y(t) + W_y(t), \tag{6}$$

where

$$k_y = \frac{R_{ZY}(\tau)}{R_{YY}(\tau)} = \frac{R_{ZY}(0)}{R_{YY}(0)} = \frac{E\{Z(t)Y(t)\}}{E\{Y^2(t)\}}, \quad \forall t, \forall \tau, \tag{7}$$

$R_{ZY}(\tau) = E\{Z(t)Y(t + \tau)\}$ is the classical cross-correlation function for stationary random processes,
and $R_{W_yY}(\tau) = 0$. For instance, the Bussgang theorem can be exploited to characterize the power
spectral density of the output of a non-linearity with Gaussian input processes. This fact induced an
extensive technical literature, with closed form solutions for the computation of $k_y$ for a wide class of
non liner distortions g{-), as detailed in Q, 0, HI, H, Q, H, CH, for real Gaussian inputs, and 
in lfT4l . ifTSl . |[T6l for complex Gaussian inputs. 

The computation of $k_y$ requires a single-folded integral, as expressed by

$$k_y = \frac{1}{P_Y} E_Y\{g(Y)Y\} = \frac{1}{P_Y} \int_{-\infty}^{+\infty} y\, g(y)\, G(y; \sigma_Y^2)\, dy, \tag{8}$$

where $P_Y = \sigma_Y^2$ is used in the following for notation compactness. Note that (8) is in general
much easier to compute than its equivalent double-folded integral

$$k_y = \frac{1}{P_Y} E_{XN}\{g(X + N)(X + N)\} = \frac{1}{P_Y} \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} (x + n)\, g(x + n)\, f_X(x) f_N(n)\, dx\, dn. \tag{9}$$
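As an illustration of the single-folded integral in (8), the following minimal sketch approximates $k_y$ by a midpoint rule. The soft limiter with unit threshold and the input power $\sigma_Y^2 = 10$ are borrowed from the simulation set-up used later in the paper; the function names, the integration span and the number of steps are assumptions made only for this example. For this specific non-linearity the result can be compared with the known closed form $\mathrm{erf}(y_{th}/\sqrt{2\sigma_Y^2})$.

```python
import math

def gaussian_pdf(y, var):
    """Zero-mean Gaussian pdf G(y; var)."""
    return math.exp(-y * y / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def soft_limiter(y, y_th=1.0):
    """Clip y to the interval [-y_th, y_th]."""
    return max(-y_th, min(y_th, y))

def linear_gain(g, var_y, n_steps=20000, span=10.0):
    """Midpoint-rule approximation of eq. (8): (1/P_Y) * integral of y g(y) G(y; var_y) dy."""
    sigma = math.sqrt(var_y)
    step = 2.0 * span * sigma / n_steps
    total = 0.0
    for i in range(n_steps):
        y = -span * sigma + (i + 0.5) * step
        total += y * g(y) * gaussian_pdf(y, var_y)
    return total * step / var_y

var_y = 10.0
k_numeric = linear_gain(soft_limiter, var_y)
# Closed form for a soft limiter with threshold 1 and Gaussian input.
k_closed = math.erf(1.0 / math.sqrt(2.0 * var_y))
```

The same routine works for any other $g(\cdot)$ passed as a Python callable, which is the practical appeal of a one-dimensional integral over a two-dimensional one.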



Additionally, it is also possible to express the non-linearity output as a linear regression with respect
to a single input $X$, or $N$, as expressed by

$$Z = g(X + N) = k_x X + W_x, \tag{10}$$

$$Z = g(X + N) = k_n N + W_n, \tag{11}$$

where

$$k_x = \frac{E\{ZX\}}{E\{X^2\}} = \frac{E_{XN}\{g(X + N)X\}}{P_X}, \tag{12}$$

$$k_n = \frac{E\{ZN\}}{E\{N^2\}} = \frac{E_{XN}\{g(X + N)N\}}{P_N}, \tag{13}$$

and $E\{XW_x\} = E\{NW_n\} = 0$. The relationship between the three regression coefficients $k_y$, $k_x$, and
$k_n$, for generic random variables, e.g., also non-Gaussian, is summarized by

$$P_Y k_y = E_{XN}\{g(X + N)(X + N)\} = E_{XN}\{g(X + N)X\} + E_{XN}\{g(X + N)N\} = P_X k_x + P_N k_n,$$

which highlights that the linear gain of the overall input is a weighted sum of the linear gains of each
input component, as expressed by

$$k_y = \frac{P_X}{P_X + P_N + 2E\{XN\}}\, k_x + \frac{P_N}{P_X + P_N + 2E\{XN\}}\, k_n. \tag{14}$$

Note that, in the special case when $k_x = k_n$, and $X$, $N$ are orthogonal (i.e., $E\{XN\} = 0$), (14)
also induces $k_y = k_x = k_n$.
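The decomposition $P_Y k_y = P_X k_x + P_N k_n$ holds for any pair of inputs, and it also holds exactly when every expectation is replaced by a sample mean. The sketch below checks this on synthetic data; the Laplace and uniform inputs, the `tanh` non-linearity, the seed and the sample size are illustrative assumptions, not choices made in the paper.

```python
import math
import random

def laplace_sample(rng, var):
    """Zero-mean Laplace sample with variance var (difference of two exponentials)."""
    b = math.sqrt(var / 2.0)  # Laplace scale, variance = 2 b^2
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)

def regression_gains(xs, ns, g):
    """Sample estimates of k_x, k_n, k_y and of the powers P_X, P_N, P_Y."""
    m = len(xs)
    ys = [x + n for x, n in zip(xs, ns)]
    zs = [g(y) for y in ys]
    p_x = sum(x * x for x in xs) / m
    p_n = sum(n * n for n in ns) / m
    p_y = sum(y * y for y in ys) / m
    k_x = sum(z * x for z, x in zip(zs, xs)) / m / p_x
    k_n = sum(z * n for z, n in zip(zs, ns)) / m / p_n
    k_y = sum(z * y for z, y in zip(zs, ys)) / m / p_y
    return k_x, k_n, k_y, p_x, p_n, p_y

rng = random.Random(0)
xs = [laplace_sample(rng, 2.0) for _ in range(20000)]
ns = [rng.uniform(-3.0, 3.0) for _ in range(20000)]
k_x, k_n, k_y, p_x, p_n, p_y = regression_gains(xs, ns, math.tanh)
# Here p_y is the sample power of Y, so it already contains the 2 E{XN} term.
residual = p_y * k_y - (p_x * k_x + p_n * k_n)
```

The residual is zero up to floating-point rounding, while the three gains themselves are generally different for these non-Gaussian inputs.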

IV. EQUAL-GAIN THEOREMS 

A case when $k_y = k_x = k_n$, for any non-linear transformation $g(\cdot)$, is summarized by the following
Theorem 1 and Lemmas:

Theorem 1: If $X$ and $N$ are two independent zero-mean Gaussian random variables, $Y = X + N$,
and $g(\cdot)$ is any non-linear single-valued regular function, then

$$\frac{E\{ZY\}}{\sigma_Y^2} = \frac{E\{ZX\}}{\sigma_X^2} = \frac{E\{ZN\}}{\sigma_N^2}, \quad \text{i.e.,} \quad k_y = k_x = k_n. \tag{15}$$

Proof: See Appendix.
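A quick numerical sanity check of the equal-gain property can be obtained with the cubic non-linearity $g(y) = y^3$, for which Gaussian moments give $k_y = E\{Y^4\}/\sigma_Y^2 = 3\sigma_Y^2$. The sketch below evaluates $k_x$ and $k_n$ through the double-folded integral and $k_y$ through the single-folded one; the choice of $g$, the variances, and the midpoint-rule grids are assumptions made for this example only.

```python
import math

def gaussian_pdf(v, var):
    """Zero-mean Gaussian pdf G(v; var)."""
    return math.exp(-v * v / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def midpoints(var, n_points=400, span=8.0):
    """Midpoint grid covering +/- span standard deviations, with its step."""
    sigma = math.sqrt(var)
    step = 2.0 * span * sigma / n_points
    return [-span * sigma + (i + 0.5) * step for i in range(n_points)], step

def partial_gain(g, var_a, var_b):
    """E{g(A+B) A} / var_a for independent zero-mean Gaussians A and B (double integral)."""
    a_grid, h_a = midpoints(var_a)
    b_grid, h_b = midpoints(var_b)
    weights_b = [gaussian_pdf(b, var_b) * h_b for b in b_grid]
    total = 0.0
    for a in a_grid:
        inner = sum(g(a + b) * w for b, w in zip(b_grid, weights_b))
        total += a * inner * gaussian_pdf(a, var_a) * h_a
    return total / var_a

def overall_gain(g, var_y):
    """E{g(Y) Y} / var_y for a zero-mean Gaussian Y (single integral)."""
    y_grid, h_y = midpoints(var_y, n_points=4000)
    return sum(y * g(y) * gaussian_pdf(y, var_y) for y in y_grid) * h_y / var_y

def cube(y):
    return y ** 3

var_x, var_n = 1.0, 2.0
k_x = partial_gain(cube, var_x, var_n)
k_n = partial_gain(cube, var_n, var_x)
k_y = overall_gain(cube, var_x + var_n)
```

With these variances all three values agree with $3(\sigma_X^2 + \sigma_N^2) = 9$, up to the integration error.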



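Lemma 1: If $Y = a_x X + a_n N$, with $a_x, a_n \in \mathbb{R}$, then

$$\frac{E\{ZY\}}{\sigma_Y^2} = \frac{1}{a_x}\frac{E\{ZX\}}{\sigma_X^2} = \frac{1}{a_n}\frac{E\{ZN\}}{\sigma_N^2}.$$

Proof: By Theorem 1, applied to the scaled variables $a_x X$ and $a_n N$ in place of $X$ and $N$. ■

Lemma 2: If $Y = \sum_{j=1}^{J} a_j X_j$, $a_j \in \mathbb{R}$, and $X_j$ are independent zero-mean Gaussian random variables,
then

$$\frac{E\{ZY\}}{\sigma_Y^2} = \frac{1}{a_j}\frac{E\{ZX_j\}}{\sigma_{X_j}^2}, \quad \forall j.$$

Proof: By Theorem 1 and Lemma 1, with $a_i X_i$ in place of $X$ and $\sum_{j \neq i} a_j X_j$ in place of $N$. ■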
In general, by equations (4), (10), and (11), it is possible to observe that

$$\begin{aligned} E\{W_y X\} &= E\{(Z - k_y(X + N))X\} \\ &= E\{ZX\} - k_y E\{X^2\} - k_y E\{NX\} \\ &= k_x E\{X^2\} + E\{W_x X\} - k_y E\{X^2\} \\ &= (k_x - k_y)P_X, \end{aligned} \tag{16}$$

and analogously

$$E\{W_y N\} = (k_n - k_y)P_N. \tag{17}$$

Since the derivations of (16) and (17) only require $X$ and $N$ to be
orthogonal (i.e., $E\{NX\} = 0$), and not necessarily Gaussian, the following more
general theorem is proved.

Theorem 2: If $X$ and $N$ are two orthogonal random variables, $Y = X + N$, and $g(\cdot)$ is any single-valued
regular function, then, by the definitions (4), (10), and (11),

$$E\{W_y X\} = E\{W_y N\} = 0 \quad \text{iff} \quad k_y = k_x = k_n. \tag{18}$$
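The relation between the distortion correlations and the gain differences can be checked on samples. In the sketch below expectations are replaced by sample means, so the cross term $E\{NX\}$ is small but not exactly zero and is therefore carried explicitly; the Gaussian and uniform inputs, the clipping function, the seed and the sample size are assumptions of this example.

```python
import random

def distortion_correlations(xs, ns, g):
    """Sample versions of E{W_y X}, E{W_y N}, (k_x - k_y) P_X and k_y E{XN}."""
    m = len(xs)
    ys = [x + n for x, n in zip(xs, ns)]
    zs = [g(y) for y in ys]
    p_x = sum(x * x for x in xs) / m
    p_y = sum(y * y for y in ys) / m
    r_xn = sum(x * n for x, n in zip(xs, ns)) / m
    k_x = sum(z * x for z, x in zip(zs, xs)) / m / p_x
    k_y = sum(z * y for z, y in zip(zs, ys)) / m / p_y
    ws = [z - k_y * y for z, y in zip(zs, ys)]  # distortion term W_y of eq. (4)
    e_wx = sum(w * x for w, x in zip(ws, xs)) / m
    e_wn = sum(w * n for w, n in zip(ws, ns)) / m
    return e_wx, e_wn, (k_x - k_y) * p_x, k_y * r_xn

def clip(y):
    return max(-1.0, min(1.0, y))

rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(50000)]
ns = [rng.uniform(-3.0, 3.0) for _ in range(50000)]  # deliberately non-Gaussian
e_wx, e_wn, predicted, cross = distortion_correlations(xs, ns, clip)
```

Here `e_wx` equals `predicted - cross` up to rounding, which reduces to (16) when the inputs are exactly orthogonal, and `e_wx + e_wn` vanishes because $W_y$ is orthogonal to $Y$ by construction.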



The property $E\{W_y X\} = E\{W_y N\} = 0$ in Theorem 2 highlights the key element that distinguishes
independent zero-mean Gaussian random inputs from the general situation, when $X$ and $N$ are
characterized by arbitrary pdfs. Indeed, for zero-mean Gaussian inputs, by means of Theorem 1 and the
sufficient condition in Theorem 2, the distortion term $W_y$ is orthogonal to both the input components $X$
and $N$, while in general it is orthogonal only to their sum $Y = X + N$. This means that, in the general
case, it is only possible to state that

$$E\{W_y X\} = -E\{W_y N\} \neq 0, \tag{19}$$

which is equivalent to linking the three linear gains by (14), rather than by (15), which is just a special case.
Another special case is summarized in the following

Theorem 3: If $X$ and $N$ are two independent zero-mean random variables with identical probability
density functions $f_X(\cdot) = f_N(\cdot)$, $Y = X + N$, and $g(\cdot)$ is any single-valued regular function, then in (4),
(10), and (11)

$$k_y = k_x = k_n. \tag{20}$$

Proof: By observing the definitions of $k_x$ and $k_n$ in (12) and (13), it is straightforward to conclude
that $k_x = k_n$ when $f_X(\cdot)$ is identical to $f_N(\cdot)$ (note that also $\sigma_X^2 = \sigma_N^2$) and, consequently, due to
$E\{XN\} = E\{X\}E\{N\} = 0$, (20) follows from (14). ■



V. A SIMPLE INTERPRETATION 

An intuitive interpretation of the cases summarized by Theorem 1, Theorem 2, and Theorem 3 is that
the non-linear function $g(\cdot)$ statistically handles each input component in the same way, in the sense that it
does not privilege or penalize any of the two, with respect to the uncorrelated distortion. In order to clarify
this intuitive statement, let us assume that $X$ and $N$ are zero-mean and uncorrelated, i.e., $E\{XN\} = 0$,
$g(\cdot)$ is an odd function, i.e., $g(-y) = -g(y)$, and that the goal is to linearly infer either $X$, or $N$, or their
sum $Y = X + N$, from the observation $Z$. In this simplified set-up, also $Z$ is zero-mean, and
consequently the best (in the MMSE sense) linear estimators of $X$, $N$, and $Y$ are expressed by [24]

$$\hat{X}(Z) = \frac{\sigma_X}{\sigma_Z}\rho_{XZ}\, Z = k_x \frac{\sigma_X^2}{\sigma_Z^2}\, Z, \tag{21}$$

$$\hat{N}(Z) = \frac{\sigma_N}{\sigma_Z}\rho_{NZ}\, Z = k_n \frac{\sigma_N^2}{\sigma_Z^2}\, Z, \tag{22}$$

$$\hat{Y}(Z) = \frac{\sigma_Y}{\sigma_Z}\rho_{YZ}\, Z = k_y \frac{\sigma_X^2 + \sigma_N^2}{\sigma_Z^2}\, Z = \hat{X}(Z) + \hat{N}(Z), \tag{23}$$

where $\rho_{XZ} = E\{XZ\}/(\sigma_X \sigma_Z)$, $\rho_{NZ}$, and $\rho_{YZ}$ are the cross-correlation coefficients for zero-mean random
variables. Note that, as is well known [24], the equality $\hat{Y}(Z) = \hat{X}(Z) + \hat{N}(Z)$ in (23) holds true also
when $k_y \neq k_x \neq k_n$. Equations (21)-(23) highlight that, if the two zero-mean inputs $X$ and $N$ equally
contribute to the input in the average power sense, i.e., when $\sigma_X^2 = \sigma_N^2$, and their non-Gaussian,
non-identical pdfs $f_X(x)$ and $f_N(n)$ induce $k_x > k_n$ (or $k_x < k_n$), then $X$ (or $N$) appears less distorted
in the output $Z$ and, consequently, $\hat{X}$ (or $\hat{N}$) gives a higher contribution to the estimation of the sum.
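The additivity of the linear estimators in (23) can be illustrated with sample-based estimator coefficients. In this sketch the Gaussian and uniform unit-power inputs, the odd non-linearity `tanh`, the seed and the sample size are assumptions chosen only to obtain two inputs with equal power and different pdfs.

```python
import math
import random

def linear_estimator_gains(xs, ns, g):
    """Sample coefficients a such that the linear estimate of each quantity is a * Z."""
    zs = [g(x + n) for x, n in zip(xs, ns)]
    p_z = sum(z * z for z in zs)
    a_x = sum(x * z for x, z in zip(xs, zs)) / p_z   # E{XZ} / E{Z^2}
    a_n = sum(n * z for n, z in zip(ns, zs)) / p_z   # E{NZ} / E{Z^2}
    a_y = sum((x + n) * z for x, n, z in zip(xs, ns, zs)) / p_z
    return a_x, a_n, a_y

rng = random.Random(7)
m = 50000
xs = [rng.gauss(0.0, 1.0) for _ in range(m)]        # Gaussian, unit power
half_width = math.sqrt(3.0)                          # uniform on [-sqrt(3), sqrt(3)] has unit power
ns = [rng.uniform(-half_width, half_width) for _ in range(m)]
a_x, a_n, a_y = linear_estimator_gains(xs, ns, math.tanh)
```

The coefficient for the sum, `a_y`, equals `a_x + a_n` up to rounding, while `a_x` and `a_n` need not coincide because the two pdfs differ.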

VI. COUNTER EXAMPLE AND CONJECTURES 



This section describes a possible way to test whether the property in (20) may hold true, or not, for
a wider class of pdfs. To this end, let us assume that $X$ is still Gaussian distributed, while $N$
is a zero-mean Gaussian-mixture, as expressed by

$$f_N(n) = \sum_{l=0}^{L} \beta_l\, G(n; \sigma_{N,l}^2) = \sum_{l=0}^{L} \frac{\beta_l}{\sqrt{2\pi\sigma_{N,l}^2}}\, e^{-\frac{n^2}{2\sigma_{N,l}^2}}, \tag{24}$$

where $\sigma_N^2 = \sum_{l=0}^{L} \beta_l \sigma_{N,l}^2$ is the variance, and $\beta_l \geq 0$, with $\sum_{l=0}^{L} \beta_l = 1$, are the probability-masses associated
to a discrete random variable, in order to grant that $f_N(n)$ is a proper pdf with unit area.

This scenario models a wide class of symmetric, zero-mean random variables, and represents a way
to control how much $N$ departs from a Gaussian distribution, depending on the choice of $L$ and of the
mixture parameters $\beta_l$, $\sigma_{N,l}^2$. For instance, this quite general framework includes an impulsive noise $N$, characterized by the
Middleton's Class-A canonical model IITtI, where $L = \infty$, $\beta_l = e^{-A} A^l / l!$ are Poisson-distributed weights, and
$\sigma_{N,l}^2 = \frac{l/A + \Gamma}{1 + \Gamma}\, \sigma_N^2$, with $A$ and $\Gamma$ the canonical parameters that control the impulsiveness of the noise
llTSl. Conversely, observe that when $L = 0$, and $\beta_0 = 1$, the hypotheses of Theorem 1 hold true, and
consequently (20) is verified.

If $X$ and $N$ are independent, $Y = X + N$ is also distributed as a Gaussian-mixture, as expressed by

$$f_Y(y) = f_N(y) * f_X(y) = \sum_{l=0}^{L} \beta_l \left[ G(y; \sigma_{N,l}^2) * f_X(y) \right] = \sum_{l=0}^{L} \beta_l\, G(y; \sigma_{Y,l}^2), \tag{25}$$

due to the fact that the convolution of two zero-mean Gaussian functions still produces a zero-mean
Gaussian function, with variance equal to $\sigma_{Y,l}^2 = \sigma_X^2 + \sigma_{N,l}^2$. Thus, the linear regression coefficient $k_y$
can be expressed by

$$k_y = \frac{E_Y\{g(Y)Y\}}{\sigma_Y^2} = \frac{1}{\sigma_Y^2} \sum_{l=0}^{L} \beta_l\, E_{Y_l}\{g(Y_l)Y_l\}, \tag{26}$$

where $Y_l = X + N_l$ stands for the $l$-th "virtual" Gaussian random variable that it is possible to associate
to the $l$-th Gaussian pdf in (25). Equation (26) suggests that, in this case, $k_y$ can be interpreted as a
weighted sum of other $L + 1$ regression coefficients

$$k_y^{(l)} = \frac{E_{Y_l}\{g(Y_l)Y_l\}}{\sigma_{Y,l}^2}, \tag{27}$$

as expressed by

$$k_y = \sum_{l=0}^{L} \beta_l \frac{\sigma_{Y,l}^2}{\sigma_Y^2}\, k_y^{(l)}. \tag{28}$$

Each gain $k_y^{(l)}$ in (28) is associated to the virtual output $Z_l = g(Y_l)$, generated by the non-linearity $g(\cdot)$
when it is applied to the Gaussian-distributed virtual input $Y_l$. Analogously,

$$k_x = \frac{1}{\sigma_X^2} E_{XN}\{g(X + N)X\} = \sum_{l=0}^{L} \beta_l\, k_x^{(l)}, \tag{29}$$

where

$$k_x^{(l)} = \frac{1}{\sigma_X^2} E_{XN_l}\{g(X + N_l)X\}. \tag{30}$$

Due to the fact that $X$, $N_l$, and $Y_l = X + N_l$ satisfy the hypotheses of Theorem 1, it is possible to
conclude that

$$k_x^{(l)} = k_y^{(l)}, \tag{31}$$

which plugged in (28) leads to

$$k_y = \sum_{l=0}^{L} \beta_l \frac{\sigma_{Y,l}^2}{\sigma_Y^2}\, k_x^{(l)}. \tag{32}$$

By direct inspection of (32) and (29), it is possible to conclude that $k_y \neq k_x$ as soon as $L > 0$, for any
value of the weights $\beta_l$, and any non-linear function $g(\cdot)$. Thus, the following conjecture is reasonable.

Conjecture 1: If $X \sim \mathcal{N}(0, \sigma_X^2)$, $N$ is an independent random variable, $Y = X + N$, and $g(\cdot)$ is any
single-valued regular function, then in the definitions (4), (10), and (11)

$$k_y = k_x = k_n \quad \text{iff} \quad N \sim \mathcal{N}(0, \sigma_N^2),$$

i.e., if and only if also $N$ is zero-mean Gaussian-distributed.

A stronger conjecture is the following

Conjecture 2: If $X$ and $N$ are two random variables, $Y = X + N$, and $g(\cdot)$ is any single-valued regular
function, then in the definitions (4), (10), and (11)

$$k_y = k_x = k_n \quad \text{iff} \quad X \sim \mathcal{N}(0, \sigma_X^2),\ N \sim \mathcal{N}(0, \sigma_N^2),\ E\{XN\} = 0,$$

i.e., if and only if the two random variables are both zero-mean, Gaussian, and independent.

The conjectures, which as such are left without a proof, imply that the hypotheses of Theorem 1 are
not only sufficient, but also necessary.
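The mixture decomposition of this section can be turned into a few lines of code. The sketch below assumes a soft limiter with unit threshold, for which each per-component Gaussian gain has the closed form $\mathrm{erf}(y_{th}/\sqrt{2\sigma_{Y,l}^2})$, and it assumes a hypothetical two-component mixture; neither choice comes from the paper's numerical results.

```python
import math

def clipper_gain(var_y, y_th=1.0):
    """Linear gain of a soft limiter driven by a zero-mean Gaussian of variance var_y."""
    return math.erf(y_th / math.sqrt(2.0 * var_y))

def mixture_gains(var_x, betas, var_n_components):
    """Return (k_x, k_y) for Gaussian X plus a zero-mean Gaussian-mixture N."""
    var_n = sum(b * v for b, v in zip(betas, var_n_components))
    var_y = var_x + var_n
    # Per-component gains: k_x^(l) = k_y^(l) by Theorem 1, eq. (31).
    k_l = [clipper_gain(var_x + v) for v in var_n_components]
    k_x = sum(b * k for b, k in zip(betas, k_l))                      # eq. (29)
    k_y = sum(b * (var_x + v) / var_y * k
              for b, v, k in zip(betas, var_n_components, k_l))       # eq. (28)
    return k_x, k_y

# Hypothetical impulsive-like mixture: mostly weak noise, occasionally strong noise.
k_x_mix, k_y_mix = mixture_gains(1.0, [0.9, 0.1], [0.1, 10.0])
# Degenerate single-component mixture: N is Gaussian and Theorem 1 applies directly.
k_x_gauss, k_y_gauss = mixture_gains(1.0, [1.0], [2.0])
```

For the two-component example the gains differ, with `k_x_mix` larger than `k_y_mix` because (28) gives more weight to the high-variance components, where the limiter gain is smaller; for the single-component case the two gains coincide.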

VII. COMPUTER-AIDED SIMULATIONS

This section reports some computer-aided simulations, to confirm the Theorems, and also to give some
further strength to the Conjectures. To this end, a simple soft-limiting non-linearity is considered, as
expressed by

$$g(y) = \begin{cases} y, & |y| < y_{th} \\ y_{th}\, \mathrm{sign}(y), & |y| \geq y_{th}. \end{cases} \tag{33}$$

The clipping threshold has been fixed to $y_{th} = 1$, and the average input power is always set to $P_Y = 10$,
in order to evidence the non-linear behavior, by frequently clipping the input $Y = X + N$. Samples of
the random variables $X$ and $N$ have been generated according to either a zero-mean Gaussian pdf (i.e., with
$f(\alpha) = G(\alpha; \sigma^2)$) or a zero-mean Laplace pdf (i.e., with analogous notation $f(\alpha) = L(\alpha; 2/\lambda^2) =
0.5\lambda e^{-\lambda|\alpha|}$). The regression coefficients $k_y$, $k_x$, and $k_n$ have been estimated by substituting each expected
value in (5), (12) and (13) with the corresponding sample-mean over 10^ samples.
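A simulation of this kind can be sketched as follows for the Gaussian case. The power split $P_X = 3$, $P_N = 7$, the seed and the number of samples are assumptions of this example (the sample size used for the paper's figures is not reproduced here); the reference value is the closed-form soft-limiter gain for a Gaussian input.

```python
import math
import random

def soft_limiter(y, y_th=1.0):
    """Soft limiter of eq. (33)."""
    return max(-y_th, min(y_th, y))

def estimate_gains(p_x, p_n, n_samples, seed=0):
    """Sample-mean estimates of k_x, k_n and k_y for independent Gaussian inputs."""
    rng = random.Random(seed)
    s_zx = s_zn = s_zy = s_xx = s_nn = s_yy = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, math.sqrt(p_x))
        n = rng.gauss(0.0, math.sqrt(p_n))
        y = x + n
        z = soft_limiter(y)
        s_zx += z * x
        s_zn += z * n
        s_zy += z * y
        s_xx += x * x
        s_nn += n * n
        s_yy += y * y
    return s_zx / s_xx, s_zn / s_nn, s_zy / s_yy

# Hypothetical split of the total power P_Y = 10 between X and N.
k_x, k_n, k_y = estimate_gains(3.0, 7.0, 100_000)
k_theory = math.erf(1.0 / math.sqrt(2.0 * 10.0))
```

Replacing one of the `rng.gauss` calls with a Laplace or other non-Gaussian generator of the same power gives the kind of comparison discussed for the remaining figures.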

Fig. 2-Fig. 5 plot the linear-regression coefficients versus the mean square ratio $\rho_p = P_X/(P_X + P_N)$,
which represents the power percentage of $Y = X + N$ that is absorbed by $X$, when $X$ and $N$ are
independent.

Fig. 2, where the input of the soft-limiter is the sum of two independent zero-mean Gaussians, confirms
Theorem 1, with all the three regression coefficients identical, independently of how the input
power $P_Y = P_X + P_N$ is split between $X$ and $N$.

Conversely, in Fig. 3 the input is the sum of two (zero-mean) independent Laplace random variables,
and $k_y \neq k_x \neq k_n$. However, when $\rho_p = 0.5$, i.e., when the input power $P_Y$ is equally split between $X$
and $N$, the three coefficients are equal, as predicted by Theorem 3.
Fig. 2. Linear regression coefficients versus the input power ratio, when the inputs are independent Gaussians, i.e.,
$X \sim G(0, P_X)$, $N \sim G(0, P_N)$, $P_Y = 10$.
In Fig. 4, where $X$ is zero-mean Gaussian while $N$ is an independent zero-mean Laplacian, it is clearly
shown that $k_y \neq k_x \neq k_n$ for any $\rho_p$, in agreement with Conjecture 1. Actually, the three coefficients tend to be
equal when $\rho_p \to 1$, because in this case the Gaussian $X$ is dominant, $Y = X + N$ is almost Gaussian
and the situation tends to that of Fig. 2.

In Fig. 5, differently from Fig. 2, the two Gaussian inputs $X$ and $N$ are not independent, and they are
correlated with a correlation coefficient $\rho_{XN} = 0.3$. It is observed that, in this case, all the regression
coefficients are different, except when $\rho_p = 0.5$, i.e., when $P_X = P_N$ and each variable absorbs a fraction
equal to $1/[2(1 + \rho_{XN})]$ of the total power $P_Y$. Note however that, when $P_X = P_N$, $k_y < k_x$ due to (14),
which becomes $k_y = k_x/(1 + \rho_{XN})$. Additionally, it is possible to observe that $k_y$ in Fig. 5 is expected to be
equal to the value in Fig. 2, because the non-linearity in both cases has a Gaussian input $Y$, with the same
power $P_Y = \sigma_Y^2 = 10$. Another interpretation of this result is the following: due to the correlation $\rho_{XN}$,
it is possible to express each separate component, for instance $N$, as a function of the other one, i.e.,
$N = \rho_{XN} X + \varepsilon$, with $\varepsilon \sim G(0, \sigma_\varepsilon^2)$, $\varepsilon$ independent of $X$, and $\sigma_\varepsilon^2$ such that $P_Y = (1 + \rho_{XN})^2 \sigma_X^2 + \sigma_\varepsilon^2$.
Thus, for $Y = U + \varepsilon$, $U = (1 + \rho_{XN})X$, the hypotheses of Theorem 1 are satisfied and consequently
$k_y = k_u = k_\varepsilon$, where by straightforward substitutions $k_u = E\{ZU\}/P_U = k_x/(1 + \rho_{XN})$.

VIII. INFORMATION AND ESTIMATION THEORETICAL IMPLICATIONS 

This section identifies a (non-exhaustive) framework that is pertinent to communications,
information theory, and estimation theory, where Theorem 1 can find a useful application. Actually,
the model in Fig. 1 is quite common in several communication systems, where $X$ may represent the
useful information, $N$ the noise or interference, and $g(\cdot)$ either a distorting non-linear device (such as
an amplifier, a limiter, an analog-to-digital converter, etc.), or an estimator/detector that is supposed to
counteract the detrimental effect of $N$ on $X$.

Typical parameters used to assess the performance of such a non-linear communication system are the
signal-to-noise power ratio (SNR), the maximal mutual information (capacity), and the mean square estimation
error (MSE), whose link has attracted several research efforts in the last decade (see [28], [29] and
references therein).
Fig. 5. Linear regression coefficients versus the input power ratio, when the inputs are correlated Gaussians, i.e., $X \sim G(0, P_X)$,
$N \sim G(0, P_N)$, $\rho_{XN} = 0.3$, $P_Y = 10$.



A. SNR considerations 

In order to define a meaningful SNR, it is useful to separate the non-linear device output as the sum
of the useful information and an uncorrelated distortion, as in (10). For simplicity, we assume in the
following that all the random variables are zero-mean, i.e., $P_X = \sigma_X^2$. Thus, the SNR at the non-linearity
output is expressed by

$$\mathrm{SNR}_x = \frac{k_x^2 E\{X^2\}}{E\{W_x^2\}} = \frac{k_x^2 \sigma_X^2}{E_Y\{g^2(Y)\} - k_x^2 \sigma_X^2} = \left( \frac{E_Y\{g^2(Y)\}}{k_x^2 \sigma_X^2} - 1 \right)^{-1}, \tag{34}$$

where the second equality is granted by the orthogonality between $X$ and $W_x$.

In the general case, in order to obtain a closed form expression for (34), it would be necessary to
solve the double-folded integral in (12) for the computation of $k_x$. However, if $X$ and $N$ are zero-mean,
independent, and Gaussian, by Theorem 1 the computation can be simplified by exploiting that $k_x = k_y$
and, consequently, the computation of the SNR would require solving only single-folded integrals, e.g.,
(8) and $E_Y\{g^2(Y)\}$. Note that in this case also $Y = X + N$ would be Gaussian and, consequently, the
computation of these single-folded integrals can benefit from the results available in the literature for a
wide class of non-linearities g{-) HI, d, H, HI, d. El, 113, HSl, |[T6l . 

Actually, it could be argued that the SNR may also be defined by exploiting (4) rather than (10). Indeed,
by rewriting (4) as

$$Z = g(X + N) = k_y X + k_y N + W_y, \tag{35}$$

another SNR could be expressed as

$$\mathrm{SNR}_y = \frac{k_y^2 E\{X^2\}}{k_y^2 E\{N^2\} + E\{W_y^2\}} = \frac{k_y^2 \sigma_X^2}{E_Y\{g^2(Y)\} - k_y^2 \sigma_X^2} = \left( \frac{E_Y\{g^2(Y)\}}{k_y^2 \sigma_X^2} - 1 \right)^{-1}. \tag{36}$$

Note that Theorem 1 states that the two SNRs in (36) and (34) are identical if $X$ and $N$ are zero-mean,
independent, and Gaussian. Conjecture 2 claims that this is also the only situation where it is correct to
use (36) in place of (34).
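Under the hypotheses of Theorem 1 the output SNR only needs two single-folded Gaussian averages. The following sketch assumes independent zero-mean Gaussian inputs, a midpoint-rule integrator, and a hypothetical power split $P_X = 3$, $P_N = 7$; the helper names are introduced here only for illustration.

```python
import math

def gaussian_expectation(h, var, n_steps=20000, span=10.0):
    """Midpoint-rule approximation of E{h(Y)} for a zero-mean Gaussian Y of variance var."""
    sigma = math.sqrt(var)
    step = 2.0 * span * sigma / n_steps
    total = 0.0
    for i in range(n_steps):
        y = -span * sigma + (i + 0.5) * step
        total += h(y) * math.exp(-y * y / (2.0 * var))
    return total * step / math.sqrt(2.0 * math.pi * var)

def output_snr(g, var_x, var_n):
    """Output SNR for independent zero-mean Gaussian X and N, using k_x = k_y."""
    var_y = var_x + var_n
    k = gaussian_expectation(lambda y: y * g(y), var_y) / var_y   # eq. (8)
    power_out = gaussian_expectation(lambda y: g(y) ** 2, var_y)  # E{g^2(Y)}
    return k * k * var_x / (power_out - k * k * var_x)

def clip(y):
    return max(-1.0, min(1.0, y))

snr_linear = output_snr(lambda y: y, 3.0, 7.0)   # no distortion: reduces to P_X / P_N
snr_clipped = output_snr(clip, 3.0, 7.0)
```

For the identity map the result reduces to the input ratio $P_X/P_N$, and the clipped output SNR stays below that value because the distortion power adds to the scaled noise.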

When $N$ is non-Gaussian, it is possible to approximate its continuous pdf by a Gaussian-mixture,
with arbitrary accuracy [26]. Thus, the example in Section VI represents a wide class of zero-mean and
symmetrical noise pdfs, summarized by the Gaussian-mixture in (24), where $k_x \neq k_y$, and (34) should be
used instead of (36). However, although (36) cannot be used to compute the SNR, as detailed in Section VI,
Theorem 1 turns out to be useful to compute $k_x$, due to the fact that

$$k_x = \sum_{l=0}^{L} \beta_l\, k_x^{(l)}, \qquad k_x^{(l)} = k_y^{(l)} = \frac{E_{Y_l}\{g(Y_l)Y_l\}}{\sigma_{Y,l}^2}. \tag{37}$$



B. MSE considerations

The definition of the error at the non-linearity output may depend on the non-linearity purpose. If the
non-linearity $g(\cdot)$ represents an estimator of $X$ given the observation $Y = X + N$, as expressed by

$$\hat{X} = g(X + N) = k_x X + W_x, \tag{38}$$

the estimation error is defined as

$$e = \hat{X} - X = (k_x - 1)X + W_x. \tag{39}$$

Exploiting the uncorrelation between $X$ and $W_x$,

$$E\{W_x^2\} = E_Y\{g^2(Y)\} - k_x^2 E\{X^2\}, \tag{40}$$

the MSE at the non-linearity output is defined by

$$\mathrm{MSE} = E\{e^2\} = (k_x - 1)^2 E\{X^2\} + E\{W_x^2\} = (1 - 2k_x)E\{X^2\} + E\{g^2(Y)\}. \tag{41}$$

However, looking at (38) from another point of view, it is also possible to consider $g(\cdot)$ as a distorting
device that scales by $k_x$ the useful information $X$, that is, (41) represents the MSE of a (conditionally)
biased estimator. In this view, it is possible to define an unbiased estimator $\hat{X}_u = \hat{X}/k_x$ and the associated
unbiased estimation error as

$$e_u = \hat{X}/k_x - X = W_x/k_x, \tag{42}$$

whose mean square-value is expressed by

$$\mathrm{MSE}_u = E\{e_u^2\} = E\{W_x^2\}/k_x^2 = E_Y\{g^2(Y)\}/k_x^2 - E\{X^2\}. \tag{43}$$
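The two error measures, and their link with the SNR, reduce to simple arithmetic once $k_x$ and $E\{g^2(Y)\}$ are known. The sketch below evaluates them for a linear scaling $g(y) = c\,y$ with $c = \sigma_X^2/\sigma_Y^2$ and independent zero-mean inputs; the numeric powers are hypothetical and the function names are introduced only for this example.

```python
def snr_x(k_x, power_out, var_x):
    """Output SNR from the gain k_x, the output power E{g^2(Y)} and the signal power."""
    return k_x ** 2 * var_x / (power_out - k_x ** 2 * var_x)

def mse_biased(k_x, power_out, var_x):
    """Eq. (41): MSE of the (conditionally) biased estimator."""
    return (1.0 - 2.0 * k_x) * var_x + power_out

def mse_unbiased(k_x, power_out, var_x):
    """Eq. (43): MSE after compensating the gain k_x."""
    return power_out / k_x ** 2 - var_x

# Hypothetical powers; g(y) = c * y gives k_x = c and E{g^2(Y)} = c^2 * var_y.
var_x, var_n = 3.0, 7.0
var_y = var_x + var_n
c = var_x / var_y
k_x, power_out = c, c * c * var_y

mse = mse_biased(k_x, power_out, var_x)
mse_u = mse_unbiased(k_x, power_out, var_x)
snr = snr_x(k_x, power_out, var_x)
```

In this linear case `mse` equals $\sigma_X^2\sigma_N^2/\sigma_Y^2$, `mse_u` equals $\sigma_N^2$, and the product `mse_u * snr` returns $\sigma_X^2$, which is the algebraic reason why minimizing the unbiased MSE and maximizing the SNR are the same criterion.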

Thus, it is straightforward to prove that, for a given information power $E\{X^2\}$, the non-linearities that
minimize the two MSEs are different, as expressed by

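$$g_{\mathrm{mmse}}(\cdot) = \arg\min_{g(\cdot)} [\mathrm{MSE}] = \arg\min_{g(\cdot)} [\log(\mathrm{MSE})] = \arg\min_{g(\cdot)} \left[ E\{g^2(Y)\}/k_x \right], \tag{44}$$

and

$$g_{\mathrm{u\text{-}mmse}}(\cdot) = \arg\min_{g(\cdot)} [\mathrm{MSE}_u] = \arg\min_{g(\cdot)} \left[ E\{g^2(Y)\}/k_x^2 \right]. \tag{45}$$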

The first criterion corresponds to the classical Bayesian minimum MSE (MMSE) estimator, that is
$g_{\mathrm{mmse}}(Y) = E\{X|Y\}$. By means of (34) and (45), the second criterion, which is the unbiased-MMSE
(UMMSE) estimator, is equivalent to the maximum-SNR (MSNR) criterion. Note that $k_x$ depends on
$g(\cdot)$ by (12) and consequently, in general,

$$g_{\mathrm{u\text{-}mmse}}(\cdot) \neq \frac{g_{\mathrm{mmse}}(\cdot)}{k_x^{(\mathrm{mmse})}}. \tag{46}$$

Indeed, the right-hand term in (46) is a (conditionally) unbiased estimator, but not the optimal one,
because it has been obtained by first optimizing the MSE, and by successively compensating the biasing
gain, while $g_{\mathrm{u\text{-}mmse}}(\cdot)$ should be obtained the other way around, as expressed by (42) and (45). The two
criteria tend to be quite similar when the functional derivative « in the neighborhood of the 

optimal solution gmmse{-)- 

Actually, the MMSE and the MSNR criteria are equivalent from an information theoretic point of
view only when $g_{\mathrm{mmse}}(\cdot)$ is linear [28], in which case $g_{\mathrm{u\text{-}mmse}}(\cdot)$ is equivalent to the right-hand side of (46). For
instance, this happens when $X$ and $N$ are both zero-mean, independent, and Gaussian as in Theorem 1,
and consequently [24]

$$\hat{X}_{\mathrm{mmse}} = g_{\mathrm{mmse}}(Y) = \frac{\sigma_X^2}{\sigma_X^2 + \sigma_N^2}\, Y = \frac{\sigma_X^2}{\sigma_X^2 + \sigma_N^2}\, (X + N) \tag{47}$$

is just a scaled version of the UMMSE,

$$\hat{X}_{\mathrm{u\text{-}mmse}} = g_{\mathrm{u\text{-}mmse}}(Y) = Y = X + N. \tag{48}$$

By noting that the SNR is not influenced by a scaling coefficient, because it affects both the useful
information and the noise, it is confirmed that for linear $g(\cdot)$ the MMSE optimal solution is also MSNR
optimal [28].

Conversely, when $N$ is not Gaussian distributed, its pdf may be, or may be approximated by, a
Gaussian-mixture as in (24). In this case, analogously to the considerations for the SNR computation, Theorem
1 turns out to be useful to compute $k_x$, and thus the MSE in (41) and (43), by the single-folded integrals
involved in (37), rather than by the double-folded integrals in (30). The reader interested in this point may
find a deeper explanation in [22], where these considerations have been fully exploited to characterize the
performance of MMSE and MSNR estimators for a Gaussian source impaired by impulsive Middleton's
Class-A noise.

C. Capacity considerations 

The computation of the capacity of the non-linear information channel $X \to Z = g(X + N)$ is in
general prohibitive, due to the complicated expression for the pdf of the disturbance component in (10)
or (35). Actually, when the noise is non-Gaussian it is even difficult to compute in closed form the
mutual information $I(X \to Y)$, in the absence of the non-linearity $g(\cdot)$, and only bounds are in general
available 17]. For instance, when the noise $N$ is the Gaussian-mixture summarized by (24), it is even
difficult to compute its differential entropy $h(N)$, which can only be bounded as suggested in [27]. In
this case, the bounds in [27] can be used for a simple upper bound of the mutual information $I(X, Z)$,
which is based on the information processing inequality [30], as expressed by

$$I(X, Z) = I(X \to Z) \leq I(X, Y). \tag{49}$$
When $X$ is Gaussian, a simple lower-bound for $I(X, Z)$ is provided by the AWGN capacity of (10) and
(35), when the disturbance is modeled as (the maximum-entropy IH) zero-mean Gaussian distributed
noise with variance equal to $E\{Z^2\} - k_x^2 \sigma_X^2$ and $E\{Z^2\} - k_y^2 \sigma_X^2$, respectively. Thus, exploiting (10)
and (34), it is possible to conclude that

$$I(X, Z) \geq C_x^{(\mathrm{AWGN})} = \frac{1}{2} \log(1 + \mathrm{SNR}_x), \tag{50}$$

while, by exploiting (35) and (36), it would be possible to conclude that

$$I(X, Z) \geq C_y^{(\mathrm{AWGN})} = \frac{1}{2} \log(1 + \mathrm{SNR}_y). \tag{51}$$

By Theorem 1, the two lower-bounds are equivalent if $X$ and $N$ are zero-mean independent Gaussians.
Otherwise, the correct SNR is (34) and the correct lower bound is (50). Note that, in both cases, either
when $N$ is Laplace distributed and independent of $X$ (see Fig. 4), or when it is Gaussian distributed
and positively correlated with $X$ (see Fig. 5), $k_x > k_y$ and consequently, by (34) and (36), $C_x^{(\mathrm{AWGN})} > C_y^{(\mathrm{AWGN})}$.
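The effect of using the wrong gain in the lower bound can be seen with a small numeric sketch. All the values below (the two gains, the output power and the signal power) are hypothetical placeholders chosen so that $k_x > k_y$, and the base-2 logarithm is an assumption made here to express the bounds in bits.

```python
import math

def snr_from_gain(k, power_out, var_x):
    """SNR of the form k^2 var_x / (E{g^2(Y)} - k^2 var_x), as in (34) and (36)."""
    return k * k * var_x / (power_out - k * k * var_x)

def awgn_lower_bound(snr):
    """Lower bound 0.5 * log2(1 + SNR), in bits per channel use."""
    return 0.5 * math.log2(1.0 + snr)

# Hypothetical values with k_x > k_y, as happens for non-Gaussian or correlated noise.
k_x, k_y = 0.30, 0.25
power_out, var_x = 0.8, 5.0

bound_x = awgn_lower_bound(snr_from_gain(k_x, power_out, var_x))
bound_y = awgn_lower_bound(snr_from_gain(k_y, power_out, var_x))
```

Since the SNR expression is increasing in the gain, `bound_x` exceeds `bound_y` whenever $k_x > k_y$, so the bound built on $k_y$ is looser in that situation.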

It is also possible to derive a bound for the mutual information of the additive channel $Y = X + N$
by exploiting the interplay between MMSE and mutual information, as suggested in [29], where it is
proved that, for non-Gaussian additive channels,

$$I(X, Y) \geq h(X) - \frac{1}{2} \log(2\pi e\, \mathrm{MMSE}). \tag{52}$$

Actually, (52) can also be readily derived from the corollary of Theorem 8.6.6 in [30]. For a Gaussian
source $X$, (52) simply becomes

$$I(X, Y) \geq \frac{1}{2} \log\left( \frac{\sigma_X^2}{\mathrm{MMSE}} \right), \tag{53}$$

where by exploiting (41) it is possible to substitute

$$\mathrm{MMSE} = \left(1 - 2k_x^{(\mathrm{mmse})}\right) \sigma_X^2 + E_Y\{g_{\mathrm{mmse}}^2(Y)\}. \tag{54}$$

Also in this case, if the noise $N$ can be modeled, or approximated, by the Gaussian-mixture in (24),
Theorem 1 turns out to be useful because it allows computing the gain $k_x^{(\mathrm{mmse})}$ by (37), where only
single-folded integrals are required. However, it is worth pointing out that in this case (37) and $E_Y\{g^2(Y)\}$
have to be computed with $g(Y) = g_{\mathrm{mmse}}(Y)$, which is characterized by a rather involved expression
[22], which prevents closed form solutions. Thus, the computation of the lower bound in (53)-(54) requires
either (single-folded) numerical integration techniques, or expressing $g_{\mathrm{mmse}}(Y)$ as a series expansion by
suitable functions (polynomials, Hermite, etc.) that admit closed form expressions for their averages
over Gaussian pdfs (see 0, lim . |[T5l and references therein). This is however out of the scope of this 
paper, and a possible subject for further investigations. 
IX. CONCLUSIONS

The main contribution of this paper has been to prove and analyze a general and theoretically interesting 
theorem for non-linear transformations of the sum of zero-mean independent Gaussian random variables. 
Due to the widespread use of Gaussian random variables, the theorem can be useful in several fields, which 
include estimation theory, information theory, and non-linear system characterization. As a side-result, the 
theorem can be used to simplify the computations involved in the analysis of non-linear transformations 
of Gaussian-mixtures. Finally, the paper has highlighted its usefulness for the computation of the SNR, 
the MSE, and bounds on the mutual information, associated with communication systems dealing with
non-linear devices and estimators. 

Appendix 

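Proof of Theorem 1: it is assumed that $g(\cdot)$ is a regular function, in the sense that it admits a power
series representation, i.e., a Maclaurin expansion $g(\alpha) = \sum_{p=0}^{\infty} c_p \alpha^p$, with $c_p = \frac{1}{p!} \left. \frac{d^p g(\alpha)}{d\alpha^p} \right|_{\alpha=0}$.
Thus, by the definition of the input-output linear-regression coefficient in (12),

$$\begin{aligned} k_x &= \frac{1}{\sigma_X^2} \int_{-\infty}^{+\infty} x \int_{-\infty}^{+\infty} g(x + n) f_N(n)\, dn\, f_X(x)\, dx \\ &= \frac{1}{\sigma_X^2} \int_{-\infty}^{+\infty} x \int_{-\infty}^{+\infty} g(\alpha) f_N(\alpha - x)\, d\alpha\, f_X(x)\, dx \\ &= \frac{1}{\sigma_X^2} \sum_{p=0}^{\infty} c_p \int_{-\infty}^{+\infty} x \int_{-\infty}^{+\infty} \alpha^p G(\alpha - x; \sigma_N^2)\, d\alpha\, f_X(x)\, dx \\ &= \frac{1}{\sigma_X^2} \sum_{p=0}^{\infty} c_p \int_{-\infty}^{+\infty} x\, m_{\alpha|x}^{(p)}(x) f_X(x)\, dx, \end{aligned} \tag{55}$$

where $m_{\alpha|x}^{(p)}(x) = E_{\alpha|x}\{\alpha^p\}$ is the non-central moment of order $p$ of a Gaussian random variable with
mean-value equal to $m_{\alpha|x} = m_{\alpha|x}^{(1)}(x) = x$ and variance $\sigma_N^2$. The non-central moments can be computed by exploiting
their well known relationship with the central moments $\mu_{\alpha|x}^{(k)} = E\{(\alpha - m_{\alpha|x})^k\}$, expressed by [23]

$$m_{\alpha|x}^{(p)} = \sum_{k=0}^{p} \binom{p}{k} \mu_{\alpha|x}^{(k)}\, m_{\alpha|x}^{p-k},$$

and the fact that for a Gaussian random variable, with variance $\sigma^2$, all the odd central moments are null,
as expressed by

$$\mu^{(k)} = \begin{cases} (k - 1)!!\, \sigma^k, & k = 2l,\ l \in \mathbb{N} \\ 0, & k = 2l + 1,\ l \in \mathbb{N}, \end{cases}$$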
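where $(n)!! = n \cdot (n - 2) \cdot (n - 4) \cdots 1$ stands for the so called double-factorial of an integer number.

Thus, equation (55) becomes

$$\begin{aligned} k_x &= \frac{1}{\sigma_X^2} \sum_{p=0}^{\infty} c_p \sum_{l=0}^{\lfloor p/2 \rfloor} \binom{p}{2l} (2l - 1)!!\, \sigma_N^{2l} \int_{-\infty}^{+\infty} x^{p-2l+1} G(x; \sigma_X^2)\, dx \\ &= \frac{1}{\sigma_X^2} \sum_{p=0}^{\infty} c_p \sum_{l=0}^{\lfloor p/2 \rfloor} \binom{p}{2l} (2l - 1)!!\, \sigma_N^{2l}\, \mu_X^{(p-2l+1)}. \end{aligned} \tag{56}$$

Due to the fact that $\mu_X^{(p-2l+1)} = 0$ when $p - 2l + 1$ is odd, that is when $p = 2j$ is even, (56) highlights
the well known property that for zero-mean Gaussian inputs, the linear-gain (or regression coefficient) is
imposed only by the odd part of the non-linearity, because only the terms in the series with $p = 2j + 1$
are different from zero. Thus, it is possible to conclude that

$$\begin{aligned} k_x &= \frac{1}{\sigma_X^2} \sum_{j=0}^{\infty} c_{2j+1} \sum_{l=0}^{j} \binom{2j+1}{2l} (2l - 1)!!\, (2j + 1 - 2l)!!\, \sigma_N^{2l} \sigma_X^{2j-2l+2} \\ &= \sum_{j=0}^{\infty} c_{2j+1} \sigma_X^{2j} \sum_{l=0}^{j} \binom{2j+1}{2l} (2l - 1)!!\, (2j + 1 - 2l)!! \left( \frac{\sigma_N^2}{\sigma_X^2} \right)^{l}. \end{aligned} \tag{57}$$

Analogously, due to the symmetry of the problem,

$$k_n = \sum_{j=0}^{\infty} c_{2j+1} \sigma_N^{2j} \sum_{l=0}^{j} \binom{2j+1}{2l} (2l - 1)!!\, (2j + 1 - 2l)!! \left( \frac{\sigma_X^2}{\sigma_N^2} \right)^{l}. \tag{58}$$

In order to prove Theorem 1, it is sufficient to prove $k_x = k_n$, which substituted in (14) leads to
$k_y = k_x = k_n$.

To this end, observe that (58) can also be written as

$$k_n = \sum_{j=0}^{\infty} c_{2j+1} \sigma_X^{2j} \sum_{l=0}^{j} \binom{2j+1}{2l} (2l - 1)!!\, (2j + 1 - 2l)!! \left( \frac{\sigma_N^2}{\sigma_X^2} \right)^{j-l}, \tag{59}$$

and by defining $q = j - l$

$$k_n = \sum_{j=0}^{\infty} c_{2j+1} \sigma_X^{2j} \sum_{q=0}^{j} \binom{2j+1}{2j-2q} (2j - 2q - 1)!!\, (2q + 1)!! \left( \frac{\sigma_N^2}{\sigma_X^2} \right)^{q}. \tag{60}$$

Exploiting the property $\binom{n}{k} = \binom{n}{n-k}$ and renaming for convenience the summation index $q$ by $l$, (60)
becomes

$$k_n = \sum_{j=0}^{\infty} c_{2j+1} \sigma_X^{2j} \sum_{l=0}^{j} \binom{2j+1}{2l+1} (2j - 2l - 1)!!\, (2l + 1)!! \left( \frac{\sigma_N^2}{\sigma_X^2} \right)^{l}. \tag{61}$$

By exploiting $n! = n!!\,(n - 1)!!$, it is straightforward to verify that

$$\binom{2j+1}{2l+1} (2l + 1)!!\, (2j - 2l - 1)!! = \binom{2j+1}{2l} (2l - 1)!!\, (2j - 2l + 1)!!. \tag{62}$$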
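The combinatorial identity between binomial coefficients and double factorials used in the last step can be checked exhaustively for small indices. The sketch below uses exact integer arithmetic and the usual convention $(-1)!! = 0!! = 1$; the range of indices tested is an arbitrary choice made for this example.

```python
from math import comb

def double_factorial(n):
    """n!! = n (n-2) (n-4) ..., with the convention (-1)!! = 0!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def lhs(j, l):
    """Coefficient of the l-th term as it appears in the series for k_n."""
    return comb(2 * j + 1, 2 * l + 1) * double_factorial(2 * l + 1) * double_factorial(2 * j - 2 * l - 1)

def rhs(j, l):
    """Coefficient of the l-th term as it appears in the series for k_x."""
    return comb(2 * j + 1, 2 * l) * double_factorial(2 * l - 1) * double_factorial(2 * j - 2 * l + 1)

# Compare the two coefficients for all 0 <= l <= j < 12.
mismatches = [(j, l) for j in range(12) for l in range(j + 1) if lhs(j, l) != rhs(j, l)]
```

An empty `mismatches` list means the two series have identical coefficients term by term over the tested range, which is what the final step of the proof relies on.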
Consequently, (61) coincides with (57) and, due to the fact that $E\{XN\} = E\{X\}E\{N\} = 0$, by